Data Source

The data here are a re-creation of the 2016 analysis from Vox, which Vox originally modeled off of the Washington State Department of Health’s analysis. They were derived from tract-level Census Bureau data for “poverty status in the last 12 months” (Table S1701) and “year structure built” (Table B25034). The original analysis used 2014 ACS data, but my analysis here uses the most recent ACS data, from 2019.

How the Data Were Constructed

The final product in this dataset is a lead risk score ranging from 1-10, 1 indicating very little lead exposure risk and 10 indicating very high lead exposure risk. These lead risk scores take into account both the age of housing and poverty status, which have both been shown to influence the risk of lead poisoning in children. (There are other factors that influence lead exposure, but it is often hard to get good data on those things.)

First, I imported the poverty and housing data from the US Census Bureau for all 50 states. Then I removed all census tracts that had missing observations for either number of households or number of individuals for whom poverty status is determined.

Then I created five categories for housing risk:

  • age_39, which is the number of households built before 1940, multiplied by 0.68
  • age40_59, which is the number of households built between 1940 and 1959, multiplied by 0.43
  • age60_79, which is the number of households built between 1960 and 1979, multiplied by 0.08
  • age80_99, which is the number of households built between 1980 and 1999, multiplied by 0.03
  • age00_plus, which is the number of households built in 2000 or later, multiplied by 0

The numbers that each category is multiplied by indicate the percentage of households in that age range that contain lead paint, as established by Jacobs et al (2002).

I then calculated the overall housing risk for each census tract by adding together the five age categories and dividing that by the total number of households.

To calculate poverty risk, I divided the number of people under 125% of the poverty line by the total number of people.

Then I standardized the housing and poverty risks by calculating z-scores for each. After standardizing them, I created a weighted lead risk, with housing weighted at 0.58, and poverty weighted at 0.42. Then I added the weighted lead risks to create a combined lead risk. Those combined risks were then split up into deciles to create the lead risk scores from 1-10.

All of this was done at the national level first and then restricted to the Charlottesville area, so all lead risk scores are relative to the rest of the country.

Variable Descriptions

glimpse(leadrisk)
## Rows: 50
## Columns: 19
## $ GEOID                 <dbl> 51065020200, 51003010201, 51003010202, 510030104…
## $ NAME                  <chr> "Census Tract 202, Fluvanna County, Virginia", "…
## $ age_39                <dbl> 374.00, 47.60, 87.72, 234.60, 27.20, 17.00, 7.48…
## $ age40_59              <dbl> 132.44, 8.60, 14.19, 102.34, 6.45, 8.17, 3.01, 3…
## $ age60_79              <dbl> 36.24, 36.56, 36.64, 39.52, 30.88, 72.72, 11.60,…
## $ age80_99              <dbl> 18.72, 28.08, 15.39, 17.76, 19.77, 41.82, 26.31,…
## $ age00_plus            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ total                 <dbl> 2297, 1995, 1359, 2137, 1512, 2812, 2000, 4186, …
## $ sum_housing_risk      <dbl> 561.40, 120.84, 153.94, 394.22, 84.30, 139.71, 4…
## $ housing_risk          <dbl> 24.440575, 6.057143, 11.327447, 18.447356, 5.575…
## $ total_poverty_status  <dbl> 4804, 4790, 3093, 4167, 3000, 6095, 4297, 7017, …
## $ total_under_125pct    <dbl> 439, 114, 149, 388, 103, 1176, 452, 901, 1166, 8…
## $ poverty_risk          <dbl> 9.138218, 2.379958, 4.817329, 9.311255, 3.433333…
## $ z_housing_risk        <dbl> 0.2870704, -0.9607426, -0.6030100, -0.1197315, -…
## $ z_poverty_risk        <dbl> -0.734216628, -1.228102854, -1.049982438, -0.721…
## $ weighted_housing_risk <dbl> 0.16650086, -0.55723072, -0.34974578, -0.0694442…
## $ weighted_poverty_risk <dbl> -0.308370984, -0.515803199, -0.440992624, -0.303…
## $ leadriskscore_raw     <dbl> -0.14187013, -1.07303392, -0.79073840, -0.372504…
## $ lead_risk_rank        <dbl> 5, 1, 2, 4, 1, 3, 1, 2, 2, 3, 5, 5, 8, 10, 6, 5,…
meta %>% 
  mutate(label = paste0(varname, ": ", about)) %>% 
  select(label) %>% 
  as.list()
## $label
##  [1] "GEOID: 11-digit FIPS census tract code"                                                                                       
##  [2] "NAME: Tract number, county, and state"                                                                                        
##  [3] "age_39: Number of houses built before 1940, multiplied by 0.68"                                                               
##  [4] "age40_59: Number of houses built between 1940 and 1959, multiplied by 0.43"                                                   
##  [5] "age60_79: Number of houses built between 1960 and 1979, multiplied by 0.08"                                                   
##  [6] "age80_99: Number of houses built between 1980 and 1999, multiplied by 0.03"                                                   
##  [7] "age00_plus: Number of houses built in 2000 or later, multiplied by 0"                                                         
##  [8] "total: Total number of houses in the census tract"                                                                            
##  [9] "sum_housing_risk: Total number of houses that contain lead paint"                                                             
## [10] "housing_risk: Percentage of houses that contain lead paint"                                                                   
## [11] "total_poverty_status: Total number of people in the census tract (for whom poverty status is determined)"                     
## [12] "total_under_125pct: Total number of people at or below 125% of the poverty line"                                              
## [13] "poverty_risk: Percentage of people at or below 125% of the poverty line"                                                      
## [14] "z_housing_risk: Standardized housing risk (z-score of housing_risk)"                                                          
## [15] "z_poverty_risk: Standardized poverty risk (z-score of poverty_risk)"                                                          
## [16] "weighted_housing_risk: Weighted housing risk (z_housing_risk*0.58)"                                                           
## [17] "weighted_poverty_risk: Weighted poverty risk (z_poverty_risk*0.42)"                                                           
## [18] "leadriskscore_raw: Raw lead risk score (weighted_housing_risk + weighted_poverty_risk)"                                       
## [19] "lead_risk_rank: Lead risk rank on a scale of 1-10, with 1 indicating very low lead risk and 10 indicating very high lead risk"

Summaries

leadrisk %>% select(-c(GEOID:NAME)) %>% 
  select(where(~is.numeric(.x) && !is.na(.x))) %>% 
  as.data.frame() %>% 
  stargazer(., type = "text", title = "Summary Statistics", digits = 0,
            summary.stat = c("mean", "sd", "min", "median", "max"))
## 
## Summary Statistics
## =======================================================
## Statistic             Mean  St. Dev. Min Median   Max  
## -------------------------------------------------------
## age_39                 131    123     0   89.1    496  
## age40_59               89      80     0    69     409  
## age60_79               40      24     3    35     139  
## age80_99               23      16     2    21      78  
## age00_plus              0      0      0     0      0   
## total                 2,260   984    148  2,128  4,626 
## sum_housing_risk       283    171     4    253    728  
## housing_risk           13      8      2    11      33  
## total_poverty_status  4,785  2,089   287 4,731.5 10,421
## total_under_125pct     731    558    103   637   3,289 
## poverty_risk           18      17     2    13      95  
## z_housing_risk         -0      1     -1    -1      1   
## z_poverty_risk         -0      1     -1    -0      6   
## weighted_housing_risk  -0      0     -1    -0      0   
## weighted_poverty_risk  -0      1     -1    -0      2   
## leadriskscore_raw      -0      1     -1    -0      2   
## lead_risk_rank          4      3      1     4      10  
## -------------------------------------------------------

Visual Distributions

leadrisk %>% 
  ggplot() +
  geom_histogram(aes(x=lead_risk_rank)) +
  scale_x_continuous("Lead risk rank", breaks=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), labels=c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

leadrisk %>% 
  ggplot() +
  geom_point(aes(x=housing_risk, y=poverty_risk)) +
  xlim(0, 100) +
  ylim(0, 100)

Spatial Distributions

Lead Risk Rank

pal <- colorFactor("viridis", reverse = TRUE, domain = leadrisk$lead_risk_rank)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(lead_risk_rank),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Lead Risk Rank: ", leadrisk$lead_risk_rank)) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$lead_risk_rank,
            title = "Lead Risk Rank", opacity = 0.7)

Housing Risk

pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$housing_risk)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(housing_risk),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Housing Risk: ", round(leadrisk$housing_risk, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$housing_risk,
            title = "Housing Risk", opacity = 0.7)

Poverty Risk

pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$poverty_risk)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(poverty_risk),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Poverty Risk: ", round(leadrisk$poverty_risk, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$poverty_risk,
            title = "Poverty Risk", opacity = 0.7)

Raw Lead Risk Score

pal <- colorNumeric("viridis", reverse = TRUE, domain = leadrisk$leadriskscore_raw)
leaflet(leadrisk) %>% 
  addProviderTiles("CartoDB.Positron") %>% 
  addPolygons(data = leadrisk,
              fillColor = ~pal(leadriskscore_raw),
              weight = 1,
              opacity = 1,
              color = "white",
              fillOpacity = 0.6,
              highlight = highlightOptions(weight = 2, fillOpacity = 0.8, bringToFront = T),
              popup = paste0(leadrisk$NAME.y, "<br>",
                             "Raw Lead Risk Score: ", round(leadrisk$leadriskscore_raw, 2))) %>% 
  addLegend("bottomright", pal = pal, values = leadrisk$leadriskscore_raw,
            title = "Raw Lead Risk Score", opacity = 0.7)